A NOTE ON ALLOCATING ITEMS TO SUBTESTS IN MULTIPLE MATRIX SAMPLING

David M. Shoemaker

Multiple matrix sampling or, more popularly, item-examinee sampling,
is a procedure in which a set of K test items is subdivided randomly
into t subtests containing k items each, with each subtest administered
to n examinees selected randomly from the population of N examinees.
Although each examinee receives only a proportion of the K test items,
the equations given by Hooke (1956) and Lord (1960) permit the researcher
to estimate parameters of the test score distribution which would have
been obtained by testing all N examinees over all K test items. Because
numerous combinations of t, k, and n are feasible in any investigation,
the researcher must come to grips with several questions about how the
procedure should be implemented. "How should items be allocated to
subtests?" is one important question requiring an answer and is the one
addressed specifically herein; concomitantly, the feasibility of using
the jackknife procedure for approximating standard errors of estimate
in multiple matrix sampling is considered in some detail.

A basic requirement in multiple matrix sampling is that k items
from the K-item population are allocated randomly to each subtest.
However, in constructing the t subtests, four general item allocation
procedures are possible, each of which is described more appropriately
as restricted random sampling. The four procedures and concomitant
restrictions are listed in Table 1 and an example of each procedure is
given in Table 2 for k = 3 and K = 7.
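The restrictions listed in Table 1 for Procedures 1, 2, and 3 can be sketched in code. The following is a minimal illustration, not the author's implementation: the function names are hypothetical, and the cyclic construction used for Procedure 3 is only one simple way (an assumption made here) of guaranteeing that every item appears with equal frequency r without repeating an item inside a subtest.

```python
import random


def allocate_random(K, k, t, rng):
    """Procedure 1: each subtest draws k distinct items; an item may
    recur in other subtests or be missed altogether."""
    return [sorted(rng.sample(range(1, K + 1), k)) for _ in range(t)]


def allocate_without_overlap(K, k, t, rng):
    """Procedure 2 (tk < K): no item appears in more than one subtest,
    so some items are never tested."""
    if t * k >= K:
        raise ValueError("Procedure 2 assumes tk < K")
    drawn = rng.sample(range(1, K + 1), t * k)
    return [sorted(drawn[j * k:(j + 1) * k]) for j in range(t)]


def allocate_equal_frequency(K, k, t, rng):
    """Procedure 3 (tk = rK): every item appears exactly r times.
    Assumed construction: repeat one random ordering r times and cut
    it into consecutive blocks of k items."""
    if k > K or (t * k) % K != 0:
        raise ValueError("Procedure 3 assumes k <= K and tk = rK")
    r = (t * k) // K
    order = rng.sample(range(1, K + 1), K)  # one random ordering
    cycle = order * r
    return [sorted(cycle[j * k:(j + 1) * k]) for j in range(t)]


rng = random.Random(0)
for name, plan in [
    ("Procedure 1", allocate_random(7, 3, 3, rng)),
    ("Procedure 2", allocate_without_overlap(7, 3, 2, rng)),
    ("Procedure 3", allocate_equal_frequency(7, 3, 7, rng)),
]:
    print(name, plan)
```

The values k = 3 and K = 7 follow Table 2; the numbers of subtests passed to each function are illustrative choices.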



Procedures 1, 2, and 3 are implemented easily in practice; Procedure
4, however, is more difficult and the degree of difficulty increases
with increases in K. Within the context of the design of experiments,
Procedures 3 and 4 are referred to, respectively, as a "partially
balanced incomplete block" design (PBIB) and a "balanced incomplete
block" design (BIB). That which is "partially balanced" or "balanced"
by each design is the item pairings. In the BIB design, all possible
item pairings occur among subtests and they occur with equal frequency;
in the PBIB design, item pairings do not occur with equal frequency and,
indeed, some item pairs may be excluded completely. A BIB design is
often difficult to implement because, for a given K, no design may
exist, or, if there is a design, the number of subtests required is
excessively large. This limitation is most serious when K exceeds 50,
even permitting minor adjustments in K to fit an available design. For
example, when K = 91 and k = 10, 91 subtests would be required; for
K = 97 and k = 10, 4656; and, for K = 199 and k = 10, 19701. The first
of these three BIB designs is cited and illustrated by Cochran and Cox
(1957); the other two are given by Ramanujacharyulu (1966) and cited by
Knapp (1968a). Although BIB designs have been used on a few occasions
(e.g., Knapp, 1968a, 1968b) when K was small (K = 43 and K = 12,
respectively, with Knapp), such designs are ill-suited to large item
populations. This point is of no minor import because one of the major
reasons for using multiple matrix sampling is its potential for dealing
with large item populations. Because of this, it is expected that the
majority of item allocation procedures in multiple matrix sampling will
involve Procedures 1, 2, or 3.
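The subtest counts quoted above can be checked against the standard counting identities for a balanced incomplete block design, r(k - 1) = λ(K - 1) and tk = rK, where r is the number of subtests containing each item and λ is the number of subtests containing each item pair. The sketch below is illustrative only: the function name is hypothetical, and the λ values (1, 45, and 45) are not stated in the text but were worked backwards from the quoted counts. The identities are necessary conditions; satisfying them does not by itself guarantee that a design exists.

```python
from fractions import Fraction


def bib_counts(K, k, lam):
    """Return (t, r) implied by the BIB counting identities for K items,
    k items per subtest, and each item pair occurring in lam subtests.
    Raises ValueError when the identities give non-integer counts."""
    r = Fraction(lam * (K - 1), k - 1)  # subtests containing each item
    t = r * K / k                       # total number of subtests
    if r.denominator != 1 or t.denominator != 1:
        raise ValueError("no BIB design can have these parameters")
    return int(t), int(r)


# lam values are assumptions inferred from the counts quoted in the text.
for K, lam in [(91, 1), (97, 45), (199, 45)]:
    t, r = bib_counts(K, 10, lam)
    print(f"K = {K}, k = 10, lambda = {lam}: t = {t}, r = {r}")
```

With these assumed λ values the function returns 91, 4656, and 19701 subtests, matching the figures in the text.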



It should be noted that, in practice, Procedures 1, 2, and 3 are
implemented typically in conjunction with item stratification, that is,
a stratified-random sampling procedure is used with the stratification
being on item content, item difficulty level, or both item content and
item difficulty level. The relative merits of such stratification
procedures have been discussed previously (i.e., Shoemaker and Osburn,
1968; Kleinke, 1971) and are not considered here.

Of principal interest in this investigation were the relative
merits of Procedures 1 and 3. Procedure 2 was excluded because it is
used rarely in practice. The metric by which these two item allocation
procedures were contrasted was the standard error of estimate.

METHOD

The research design was one of post mortem item-examinee sampling,
with the required data bases generated through a computer simulation
model described previously by Shoemaker (1972). In post mortem item-
examinee sampling, various samples of items and examinees are selected
randomly from a data base (an item by examinee matrix) and used to
estimate parameters of the base from which they have been sampled. The
researcher acts as if only certain examinees have been tested over
certain items, knowing all the while the results which were obtained by
testing all examinees over all items.
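The logic of post mortem sampling can be sketched as follows. This is a hedged illustration, not the simulation model of Shoemaker (1972): the data generator (independent 0/1 item scores with a common probability of success) is a hypothetical stand-in, and the estimator shown is a simple pooled one (K times the proportion of sampled item responses that are correct) rather than the exact equations of Hooke (1956) or Lord (1960).

```python
import random


def simulate_data_base(n_examinees, n_items, rng, p_correct=0.6):
    """Hypothetical data base: an examinee-by-item matrix of 0/1 scores."""
    return [[1 if rng.random() < p_correct else 0 for _ in range(n_items)]
            for _ in range(n_examinees)]


def pooled_mean_estimate(data, subtests, n, rng):
    """Estimate the mean total-test score from t subtests, each given
    to n examinees; no examinee takes more than one subtest."""
    K = len(data[0])
    t = len(subtests)
    chosen = rng.sample(range(len(data)), t * n)  # distinct examinees
    correct = observed = 0
    for j, items in enumerate(subtests):
        for e in chosen[j * n:(j + 1) * n]:
            correct += sum(data[e][i] for i in items)
            observed += len(items)
    return K * correct / observed


rng = random.Random(1)
data = simulate_data_base(1000, 40, rng)

# A (t = 4 / k = 10 / n = 50) plan in which every item is used once.
items = rng.sample(range(40), 40)
subtests = [items[j * 10:(j + 1) * 10] for j in range(4)]

estimate = pooled_mean_estimate(data, subtests, 50, rng)
true_mean = sum(map(sum, data)) / len(data)  # known only "post mortem"
print(estimate, true_mean)
```

Because the full matrix is available, the estimate from the sampled cells can be compared directly with the value obtained from all examinees over all items.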

Parameters of the data base manipulated systematically were:
(a) the number of test items (K = 40, 60), (b) variance of the item
difficulty indices (σ_p² = .00, .05), and (c) degree of skewness in the
normative distribution (distributed normally, markedly negatively skewed).
When the test scores were negatively skewed, only σ_p² = 0 was used. The
reliability of the total scores was equal to .80 for all data bases
generated. The 9 item-examinee sampling plans used are listed in
Table 3. A PBIB design was used only when σ_p² > 0 for a given data base.
When σ_p² = 0, items are statistically parallel and Procedures 1 and
3 produce equivalent results (and all differences observed between the
two procedures would be due to the sampling of examinees).

The parameters estimated were μ (the mean test score), μ_2 through μ_4
(the second through fourth central moments) and σ_p². The equations used
to estimate the moments of the test score distribution were those given
by Lord (1960); σ_p² was estimated through a components of variance
analysis. The results of each sampling plan were replicated 50 times.

Of additional concern in this investigation was a continued
examination of the feasibility of the jackknife procedure in approximating
standard errors of estimate in multiple matrix sampling. A description
of the jackknife procedure and encouraging preliminary results in this
area are given by Shoemaker (1972). In general, the jackknife operates
on a data base which has been divided into subgroups of data and gives
a mean estimate of the parameter computed over subgroups and an estimate
of the standard error of estimate associated with this statistic. A
basic component of the jackknife is the pseudovalue associated with each
subgroup which, for each subgroup, is the weighted difference between
the statistic computed on all the data and the statistic computed on the
body of data which remains after omitting that subgroup. Because the
pseudovalues are relatively independent of each other, the standard
error of the statistic is computed according to the well-known formula
for the standard error of a sample mean. If the statistics computed on
each subgroup are weighted equally, the pseudovalues reduce algebraically
to the averages for the subgroups. When the jackknife is applied to
multiple matrix sampling there are t subgroups of data but only one
score (the estimated parameter) for each subgroup, with that statistic
weighted according to the number of observations nk acquired by that
subtest. The jackknife operates on the statistics obtained from one
set of t subtests and approximates the variability of the pooled estimates
which would have been observed over repeated replications of the design.
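The jackknife computation described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the author's program: the function names are hypothetical, the pseudovalue is taken in its usual form, t times the all-data statistic minus (t - 1) times the statistic with one subgroup omitted, and the four subtest estimates are invented numbers used only to exercise the code.

```python
from math import sqrt
from statistics import mean, stdev


def weighted_estimate(estimates, weights):
    """Pooled statistic: subtest estimates weighted by observations nk."""
    return sum(w * x for x, w in zip(estimates, weights)) / sum(weights)


def jackknife(estimates, weights):
    """Return (jackknifed estimate, standard error, pseudovalues) for
    t subtest estimates, omitting one subtest at a time."""
    t = len(estimates)
    theta_all = weighted_estimate(estimates, weights)
    pseudovalues = []
    for j in range(t):
        rest_x = estimates[:j] + estimates[j + 1:]
        rest_w = weights[:j] + weights[j + 1:]
        theta_minus_j = weighted_estimate(rest_x, rest_w)
        pseudovalues.append(t * theta_all - (t - 1) * theta_minus_j)
    # Standard error of a sample mean, applied to the pseudovalues.
    return mean(pseudovalues), stdev(pseudovalues) / sqrt(t), pseudovalues


# Hypothetical subtest estimates of the mean test score (t = 4), each
# based on nk = 50 * 10 = 500 observations.
subtest_estimates = [24.1, 25.3, 23.8, 24.6]
weights = [500, 500, 500, 500]
estimate, std_error, pseudo = jackknife(subtest_estimates, weights)
print(estimate, std_error, pseudo)
```

With equal weights, as here, the pseudovalues coincide (up to rounding) with the subtest estimates themselves, which is the algebraic reduction noted in the text.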

RESULTS

All results are reported in Tables 3 through 7. Because the entries
in each table are interpreted similarly, only those for one sampling plan
in Table 3 will be described in detail. The first three entries in the
first row of Table 3 give the parameters of the data base. In this case,
the item population consisted of 40 items, the variance of the item
difficulty indices (p = proportion answering the item correctly) was
equal to 0 and the test scores were distributed normally. Using a
(t = 4 / k = 10 / n = 50) item-examinee sampling plan with random allocation
of items to subtests (Procedure 1 in Table 1) and replicating the sampling
plan 50 times, the standard deviation of the 50 pooled estimates of the
mean test score on the 40-item test was equal to .4695. Fifty jackknifed
estimates of the standard error of the mean were produced. Their mean
was equal to .4793; their standard deviation, .2445. If the items for
each subtest had been allocated using a PBIB design (Procedure 3 in
Table 1), corresponding results would have appeared under "PBIB" in the
first row. None are given there because σ_p² = 0 and the two item
allocation procedures are equivalent.

Looking at the results for SE(R) across Tables 3 through 7, it is
generally the case that, for each sampling plan, the standard error of
estimate is less when a PBIB design is used. The relative magnitude of
this discrepancy was greater for the mean test score and decreased
sharply for successively higher central moments. Because several
combinations of t and k (for a given tk) occurred among sampling plans,
it was possible to examine the effect of certain combinations on the
standard error of estimate. For a given tk, an increase in t resulted
in a decrease in SE(R) when estimating the mean test score; for the
second through fourth central moments, an increase in k resulted in a
decrease in SE(R); and, for σ_p², no trend was discernible.

Regarding the jackknife, the results indicate that on the average
it did approximate well standard errors of estimate. A major exception,
and one noted previously by Shoemaker (1972), was found in estimating
the standard error of the mean test score using a PBIB design, where the
jackknife consistently and markedly overestimated SE(R). However, the
jackknife did approximate well the standard error here when a random
sampling design was used to allocate items to subtests. Looking at the
results across parameters, it was generally found that, when a PBIB
design was used, the jackknife overestimated standard errors of estimate.
This did not occur when a random sampling design (Procedure 1 in Table 1)
was used. The relative discrepancy was most marked for the mean test
score and decreased in magnitude for successively higher central moments.



In a manner similar to SE(R), the standard deviation of the jackknifed
estimates of the standard error, SD(J), decreased with increases in t when
estimating the standard error of the mean test score and decreased
generally with increases in k when estimating the standard errors of
the higher central moments for a given tk.

DISCUSSION



The results support the conclusion that the procedure for allocating
items to subtests in multiple matrix sampling is an important
consideration. Specifically, a partially balanced incomplete block design
is preferable to a random allocation for sampling plans having the same tk.
The superiority of the PBIB is most apparent in estimating the mean test
score and becomes less apparent in estimating higher central moments.
This reinforces a conclusion made by Lord and Novick (1968) that, in
estimating the mean test score, omitting even one item has a drastic effect
on the standard error of estimate. In this investigation, a PBIB design
guaranteed that each of the K items was included in some subtest. Such
was not the case with a random item allocation, where it was quite
possible for certain items to be omitted completely (as happened to
item 2 in Procedure 1 in Table 2). The results indicate that the Lord
and Novick conclusion is applicable to higher central moments but the
expected discrepancies are not as drastic as those expected with the
mean test score.
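How often a random allocation omits items can be quantified directly. Under Procedure 1, each subtest draws k of the K items without replacement and the t subtests are drawn independently, so a given item is missed by every subtest with probability (1 - k/K)^t. The sketch below applies this to the (t = 4 / k = 10 / n = 50) plan with K = 40 described in the Results; the function names are hypothetical.

```python
def prob_item_omitted(K, k, t):
    """Probability that a given item appears in none of t subtests when
    each subtest independently draws k of K items without replacement."""
    return (1 - k / K) ** t


def expected_items_omitted(K, k, t):
    """Expected number of the K items that no subtest contains."""
    return K * prob_item_omitted(K, k, t)


print(prob_item_omitted(40, 10, 4))       # 0.31640625
print(expected_items_omitted(40, 10, 4))  # 12.65625
```

For this plan roughly 13 of the 40 items would be expected to go untested under random allocation, whereas an allocation satisfying tk = rK with equal item frequency tests every item.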

Of additional interest in this investigation was the use of the
jackknife in approximating standard errors of estimate in multiple
matrix sampling. The results reinforce the conclusion drawn by
Shoemaker (1972) that the jackknife can be used for this purpose and
also shed light on a problem mentioned therein. Shoemaker noted that
the jackknife overestimated the standard error of the mean test score
when σ_p² = .05 and items were allocated to subtests using a PBIB design.
The results in Table 3 suggest that the inability of the jackknife to
perform well in this case was a function of the item allocation procedure.
For the jackknife to be appropriate, the pseudovalues must be independent
and the results suggest that this requirement is violated with a PBIB
design. Regarding this violation, the jackknife is not as robust when
estimating the standard error of the mean test score as it is in
estimating standard errors of higher central moments. The conclusion
seems warranted that, when σ_p² departs significantly from zero and a
PBIB design is used to allocate items to subtests, the jackknife will
approximate conservatively the standard error of estimate in multiple
matrix sampling.

TABLE 1

Procedures for Allocating Items to Subtests in Multiple Matrix Sampling

Item Allocation           Restrictions             Restrictions on
Procedure                 on tk                    Sampling of Items
---------------------------------------------------------------------------
1. Random Sampling        None                     Without replacement
                                                   within each subtest
                                                   With replacement
                                                   among subtests

2. Partially Balanced     tk < K                   Without replacement
   Incomplete Block                                within each subtest
   Design (not all                                 Without replacement
   items tested)                                   among subtests

3. Partially Balanced     tk ≥ K                   Without replacement
   Incomplete Block       tk = rK (r integer)      within each subtest
   Design (all items                               Each of the items
   tested)                                         appears with equal
                                                   frequency (r) among
                                                   subtests

4. Balanced Incomplete    tk > K                   Without replacement
   Block Design           tk = rK (r integer)      within each subtest
                          λ = r(k - 1)/(K - 1)     Each of the K(K - 1)/2
                          (λ integer)              item pairings appears
                                                   with equal frequency
                                                   among subtests

TABLE 2

Examples of Subtests Resulting From the Four Item Allocation
Procedures Described in Table 1 Using k = 3 and K = 7

Procedure 1      Procedure 2      Procedure 3      Procedure 4